"Secure" Log-Linear and Logistic Regression Analysis of Distributed Databases
Abstract
The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases where there are barriers to simply integrating the databases. This paper discusses various techniques that can be used to perform statistical analyses of categorical data, especially log-linear analysis and logistic regression, over partitioned databases while limiting confidentiality concerns. We show how ideas from the current literature that focus on “secure” summations and secure regression analysis can be adapted or generalized to the categorical data setting.
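As a minimal sketch of how a secure summation carries over to categorical data, assume the databases are horizontally partitioned (each party holds different records on the same categorical variables) and that the parties are honest-but-curious and do not collude. The full pooled contingency table is sufficient for any log-linear model on those variables, so it can be assembled cell by cell with a ring-based secure sum. The function names, the modulus, and the example tables are hypothetical illustrations, not the paper's own protocol.

```python
import random

# Assumed public modulus; it must exceed any total the parties could produce.
MODULUS = 2**61 - 1


def secure_sum(values, modulus=MODULUS, rng=None):
    """Ring-based secure summation, simulated in a single process.

    Party 1 adds a random mask to its value and passes the masked total
    to party 2; each later party adds its own value (mod modulus) and
    passes it on. Party 1 finally removes the mask. Every intermediate
    total is uniformly distributed, so (absent collusion) no party learns
    another party's individual value.
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)
    running = (mask + values[0]) % modulus   # party 1 starts the ring
    for v in values[1:]:
        running = (running + v) % modulus    # parties 2..K add their values
    return (running - mask) % modulus        # party 1 removes the mask


def secure_pooled_table(tables):
    """Pool K local contingency tables (flattened, same cell order) cell by cell."""
    return [secure_sum([t[c] for t in tables]) for c in range(len(tables[0]))]


def independence_fit(table_2d):
    """Fitted counts for the log-linear independence model of a two-way table."""
    rows = [sum(r) for r in table_2d]
    cols = [sum(c) for c in zip(*table_2d)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


# Three hypothetical parties, each holding a 2x2 table flattened row-wise.
party_a = [10, 5, 3, 2]
party_b = [7, 8, 1, 4]
party_c = [2, 0, 6, 9]

pooled = secure_pooled_table([party_a, party_b, party_c])
print(pooled)  # [19, 13, 10, 15]
print(independence_fit([pooled[0:2], pooled[2:4]]))
```

The ring design reveals only the final totals, but two parties adjacent to a third in the ring could subtract their messages to recover that party's counts, which is why non-collusion is assumed. When all predictors are categorical, a logistic regression can be fit as an equivalent log-linear model, so the same pooled table also supports it.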
Similar resources
Secure Regression on Distributed Databases
This article presents several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share information from their individual databases, or to make such information available to others. Secure data integration, which provides the lowe...
Secure analysis of distributed chemical databases without data integration
We present a method for performing statistically valid linear regressions on the union of distributed chemical databases that preserves confidentiality of those databases. The method employs secure multi-party computation to share local sufficient statistics necessary to compute least squares estimators of regression coefficients, error variances and other quantities of interest. We illustrate ...
Secure Regression for Vertically Partitioned, Partially Overlapping Data
We consider the setting where multiple parties with different variables and units seek to combine their data to fit regressions but are not willing or not allowed to share their data values. We present a general strategy to tackle such problems by treating them as missing data problems, and we estimate regression coefficients using secure EM algorithms. We present secure EM algorithms for linea...
Regression on Distributed Databases via Secure Multi-Party Computation
We present a method for performing linear regression on the union of distributed databases that does not entail constructing an integrated database, and therefore preserves confidentiality of the individual databases. The method can be used by statistical agencies to share information from their individual databases, or to make such information available to others.
A secure distributed logistic regression protocol for the detection of rare adverse drug events
BACKGROUND: There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from diffe...
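The approach summarized in the secure-regression entries above, in which each database contributes only its local sufficient statistics, can be sketched as follows for horizontally partitioned data. This is a minimal illustration, not the protocol of any of the listed papers: it assumes honest-but-curious, non-colluding parties, reuses a ring-based secure sum, and encodes real values in fixed point. The function names, the modulus, the `SCALE` factor, and the data are hypothetical.

```python
import random

MODULUS = 2**61 - 1   # assumed public modulus for the secure sum
SCALE = 10**6         # assumed fixed-point scale for real-valued statistics


def secure_sum(values, modulus=MODULUS, rng=None):
    """Ring-based secure sum of ring elements mod `modulus` (simulated)."""
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)
    running = (mask + values[0]) % modulus   # party 1 masks its value
    for v in values[1:]:
        running = (running + v) % modulus    # other parties add theirs
    return (running - mask) % modulus        # party 1 removes the mask


def encode(x):
    """Fixed-point encode a real number into the ring [0, MODULUS)."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Map a ring element back to a signed real number."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def local_stats(xs, ys):
    """Local X'X and X'y, with an intercept column prepended to each row of xs."""
    rows = [[1.0, *x] for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return xtx, xty


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def secure_least_squares(parties):
    """Pool each party's X'X and X'y entrywise via secure_sum, then solve."""
    stats = [local_stats(xs, ys) for xs, ys in parties]
    p = len(stats[0][1])
    xtx = [[decode(secure_sum([encode(s[0][i][j]) for s in stats]))
            for j in range(p)] for i in range(p)]
    xty = [decode(secure_sum([encode(s[1][i]) for s in stats])) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data lying exactly on y = 1 + 2x, split across three parties.
parties = [
    ([[0.0], [1.0]], [1.0, 3.0]),
    ([[2.0], [3.0]], [5.0, 7.0]),
    ([[4.0]], [9.0]),
]
print(secure_least_squares(parties))  # [1.0, 2.0]
```

Only the pooled X'X and X'y are revealed, which is exactly what the least-squares estimate (X'X)^-1 X'y requires; no party's individual records or local statistics leave its site unmasked.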
Publication date: 2006